    Geometry-Based Next Frame Prediction from Monocular Video

    We consider the problem of next frame prediction from video input. A recurrent convolutional neural network is trained to predict depth from monocular video input, which, along with the current video image and the camera trajectory, can then be used to compute the next frame. Unlike prior next-frame prediction approaches, we take advantage of the scene geometry and use the predicted depth to generate the next frame prediction. Our approach produces rich next frame predictions that include depth information attached to each pixel. Another novel aspect of our approach is that it predicts depth from a sequence of images (e.g., a video) rather than from a single still image. We evaluate the proposed approach on the KITTI dataset, a standard dataset for benchmarking tasks relevant to autonomous driving. The proposed method produces results that are visually and numerically superior to existing methods that directly predict the next frame. We show that the accuracy of depth prediction improves as more prior frames are considered.
    Comment: To appear in 2017 IEEE Intelligent Vehicles Symposium.
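As a concrete illustration of the geometric step this abstract describes, the sketch below forward-warps the current frame into the next view using a predicted depth map, camera intrinsics, and the known camera motion. This is a minimal NumPy rendering of the idea, not the paper's implementation: the function name, the nearest-neighbor splatting, and all parameters are illustrative assumptions (the paper predicts depth with a recurrent ConvNet and evaluates on KITTI).

```python
import numpy as np

def warp_next_frame(image, depth, K, R, t):
    """Synthesize the next frame by reprojecting pixels of the current
    frame through the predicted depth and the known camera motion.

    image: (H, W, 3) current frame
    depth: (H, W) predicted per-pixel depth
    K:     (3, 3) camera intrinsics
    R, t:  rotation (3, 3) and translation (3,) from current to next camera
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, flattened row-major.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)

    # Back-project each pixel to 3D using the predicted depth.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Move the points into the next camera's frame and project them.
    pts_next = R @ pts + t[:, None]
    proj = K @ pts_next
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)

    # Scatter source pixels to their new locations (nearest-neighbor splat).
    out = np.zeros_like(image)
    ui = np.round(uv[0]).astype(int)
    vi = np.round(uv[1]).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H) & (pts_next[2] > 0)
    out[vi[valid], ui[valid]] = image.reshape(-1, 3)[valid]
    return out
```

A real renderer would handle occlusions (z-buffering) and fill holes left by disocclusion; the splat above keeps only the geometric core.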

    Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

    Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive, and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects, and is shown to transfer across data domains, e.g. from outdoor to indoor scenes. The main idea is to introduce geometric structure in the learning process, by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore, an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion, e.g. through learned flow. Our results are comparable in quality to those that used stereo as supervision and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.
    Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19).
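The supervision signal underlying this family of unsupervised depth-and-ego-motion methods is a view-synthesis (photometric reprojection) loss: warp a neighboring frame into the target view using the predicted depth and pose, and penalize the photometric difference. Below is a hedged PyTorch sketch of that loss under those assumptions; it omits the paper's distinctive contributions (per-object motion modeling and online refinement), and all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, K, K_inv, pose):
    """View-synthesis loss: warp `source` into the `target` view via
    predicted depth and relative pose, then compare photometrically.

    target, source: (B, 3, H, W) adjacent video frames
    depth:          (B, 1, H, W) predicted depth for the target frame
    K, K_inv:       (B, 3, 3) camera intrinsics and their inverses
    pose:           (B, 3, 4) predicted relative pose [R | t]
    """
    B, _, H, W = target.shape
    device = target.device

    # Homogeneous pixel grid, shared across the batch.
    v, u = torch.meshgrid(torch.arange(H, device=device),
                          torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).float()  # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1)                    # (B, 3, H*W)

    # Back-project, transform by the predicted pose, re-project.
    cam = K_inv @ pix * depth.view(B, 1, -1)
    cam = pose[:, :, :3] @ cam + pose[:, :, 3:]
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                # (B, 2, H*W)

    # Normalize coordinates to [-1, 1] and sample the source frame.
    uv = uv.view(B, 2, H, W).permute(0, 2, 3, 1)
    grid = torch.stack([2 * uv[..., 0] / (W - 1) - 1,
                        2 * uv[..., 1] / (H - 1) - 1], dim=-1)
    warped = F.grid_sample(source, grid, align_corners=True)

    return (warped - target).abs().mean()
```

Because the loss is differentiable in both depth and pose, gradients flow back into the depth and ego-motion networks jointly; struct2depth adds learned per-object motion on top of this shared backbone.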

    A phase III, randomized, two-armed, double-blind, parallel, active-controlled, and non-inferiority clinical trial to compare efficacy and safety of biosimilar adalimumab (CinnoRA®) to the reference product (Humira®) in patients with active rheumatoid arthritis

    Background: This study aimed to compare the efficacy and safety of test-adalimumab (CinnoRA®, CinnaGen, Iran) to the innovator product (Humira®, AbbVie, USA) in adult patients with active rheumatoid arthritis (RA). Methods: In this randomized, double-blind, active-controlled, non-inferiority trial, a total of 136 patients with active RA were randomized to receive 40 mg subcutaneous injections of either CinnoRA® or Humira® every other week, while receiving methotrexate (15 mg/week), folic acid (1 mg/day), and prednisolone (7.5 mg/day), over a period of 24 weeks. Physical examinations, vital sign evaluations, and laboratory tests were conducted at baseline and at the 12-week and 24-week visits. The primary endpoint was the proportion of patients achieving a good or moderate European League Against Rheumatism (EULAR) response, based on the disease activity score in 28 joints-erythrocyte sedimentation rate (DAS28-ESR). The secondary endpoints were the proportion of patients achieving American College of Rheumatology 20% (ACR20), 50% (ACR50), and 70% (ACR70) responses, the disability index of the health assessment questionnaire (HAQ), and safety. Results: Patients randomized to the CinnoRA® or Humira® arm had comparable demographic information, laboratory results, and disease characteristics at baseline. The proportion of patients achieving good and moderate EULAR responses in the CinnoRA® group was non-inferior to the Humira® group at 12 and 24 weeks in both the intention-to-treat (ITT) and per-protocol (PP) populations (all p values >0.05). No significant difference was noted in the proportion of patients attaining ACR20, ACR50, and ACR70 responses in the CinnoRA® and Humira® groups (all p values >0.05). Further, the differences in HAQ scores and safety outcome measures between treatment arms were not statistically significant. Conclusion: CinnoRA® was shown to be non-inferior to Humira® in terms of efficacy at week 24, with a safety profile comparable to the reference product.
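For readers unfamiliar with how a non-inferiority comparison of response proportions is typically assessed, here is a minimal Python sketch using the normal approximation: the test arm is declared non-inferior if the lower bound of the confidence interval for the difference in response rates stays above the negated margin. The margin and the example counts below are hypothetical illustrations, not the trial's prespecified margin or actual data.

```python
from math import sqrt

def non_inferior(x_test, n_test, x_ref, n_ref, margin=0.15):
    """Two-proportion non-inferiority check via the normal approximation.

    Declares the test arm non-inferior if the lower bound of the 95%
    confidence interval for (p_test - p_ref) lies above -margin.
    """
    p_t, p_r = x_test / n_test, x_ref / n_ref
    diff = p_t - p_r
    se = sqrt(p_t * (1 - p_t) / n_test + p_r * (1 - p_r) / n_ref)
    lower = diff - 1.96 * se  # lower bound of a two-sided 95% CI
    return lower > -margin, diff, lower

# Hypothetical counts (NOT the trial's data): 60/68 vs 61/68 responders.
ok, diff, lower = non_inferior(60, 68, 61, 68)
print(f"non-inferior: {ok}, diff: {diff:.3f}, 95% lower bound: {lower:.3f}")
```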

    Robotic Table Tennis: A Case Study into a High Speed Learning System

    We present a deep dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyperparameters, and the choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
    Comment: Published and presented at Robotics: Science and Systems (RSS 2023).
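One latency lesson the abstract highlights can be sketched in a few lines: timestamp each observation at capture time and roll the state forward by the measured delay before the policy acts, so perception and inference latency do not leave the controller acting on a stale world. The toy loop below is an illustrative sketch under that assumption, not the paper's controller; every hook (perceive, forward_model, policy, send_command) is a hypothetical callable.

```python
import time

def control_loop(perceive, forward_model, policy, send_command,
                 period=0.008, steps=1000):
    """Toy fixed-rate control loop sketching latency compensation.

    Observations are timestamped at capture, and the state is rolled
    forward by the measured delay before the policy acts. All four
    callables are hypothetical hooks, not the paper's interfaces.
    """
    for _ in range(steps):
        tick = time.monotonic()
        obs, t_capture = perceive()                 # observation + capture time
        latency = time.monotonic() - t_capture      # perception/transport delay
        state_now = forward_model(obs, dt=latency)  # compensate for staleness
        send_command(policy(state_now))
        # Hold a fixed control period: learned policies can be sensitive
        # to timing jitter between training and deployment.
        time.sleep(max(0.0, period - (time.monotonic() - tick)))
```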

    Revisiting Multi-Scale Feature Fusion for Semantic Segmentation

    It is commonly believed that high internal resolution combined with expensive operations (e.g. atrous convolutions) is necessary for accurate semantic segmentation, resulting in slow speed and large memory usage. In this paper, we question this belief and demonstrate that neither high internal resolution nor atrous convolutions are necessary. Our intuition is that although segmentation is a dense per-pixel prediction task, the semantics of each pixel often depend on both nearby neighbors and far-away context; therefore, a more powerful multi-scale feature fusion network plays a critical role. Following this intuition, we revisit the conventional multi-scale feature space (typically capped at P5) and extend it to a much richer space, up to P9, where the smallest features are only 1/512 of the input size and thus have very large receptive fields. To process such a rich feature space, we leverage the recent BiFPN to fuse the multi-scale features. Based on these insights, we develop a simplified segmentation model, named ESeg, which has neither high internal resolution nor expensive atrous convolutions. Perhaps surprisingly, our simple method can achieve better accuracy with faster speed than prior art across multiple datasets. In real-time settings, ESeg-Lite-S achieves 76.0% mIoU on CityScapes [12] at 189 FPS, outperforming FasterSeg [9] (73.1% mIoU at 170 FPS). Our ESeg-Lite-L runs at 79 FPS and achieves 80.1% mIoU, largely closing the gap between real-time and high-performance segmentation models.
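The P5-to-P9 extension is easy to make concrete: each extra pyramid level halves the spatial resolution, so P9 sits at 1/2^9 = 1/512 of the input. The PyTorch sketch below builds such extra levels with stride-2 blocks; the layer choices are illustrative assumptions, and the BiFPN fusion that ESeg applies on top is omitted.

```python
import torch
import torch.nn as nn

class ExtendedPyramid(nn.Module):
    """Extend a P5 feature map to P9 by repeated stride-2 downsampling.

    Each extra level halves the resolution, so P9 is 1/512 of the input
    and carries a very large receptive field. Block design is illustrative.
    """
    def __init__(self, channels=64):
        super().__init__()
        # One stride-2 block per extra level: P5 -> P6 -> P7 -> P8 -> P9.
        self.downs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(4)
        ])

    def forward(self, p5):
        feats = [p5]                       # P5 at 1/32 of the input
        for down in self.downs:
            feats.append(down(feats[-1]))  # P6..P9 at 1/64 .. 1/512
        return feats

# For a 512x512 input, P5 is 16x16 (1/32) and P9 comes out 1x1 (1/512).
p5 = torch.randn(1, 64, 16, 16)
levels = ExtendedPyramid()(p5)
print([f.shape[-1] for f in levels])  # [16, 8, 4, 2, 1]
```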